Automatic Diachronic Normalization of Polish Texts


Similar resources

Automatic Term Recognition in Polish Texts

Although ATR has been in the research focus for over a decade now, most approaches have been developed for highly positional languages, whereas only a few efforts have been made for Slavic languages which have a richer morphological inflection and a more relaxed word order, e.g., Vintar (2004) (for Slovene) and Nenadic et al. (2003) (for Serbian). In this paper, we report on our experiments in ...


Using Comparable Collections of Historical Texts for Building a Diachronic Dictionary for Spelling Normalization

In this paper, we argue that comparable collections of historical written resources can help overcoming typical challenges posed by heritage texts enhancing spelling normalization, POS-tagging and subsequent diachronic linguistic analyses. Thus, we present a comparable corpus of historical German recipes and show how such a comparable text collection together with the application of innovative ...


Measuring Readability of Polish Texts: Baseline Experiments

Measuring readability of a text is the first sensible step to its simplification. In this paper we present an overview of the most common approaches to automatic measuring of readability. Of the described ones, we implemented and evaluated: Gunning FOG index, Flesch-based Pisarek method. We also present two other approaches. The first one is based on measuring distributional lexical similarity ...


Terminology extraction from medical texts in Polish

BACKGROUND Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need informat...


Rule-Based Normalization of Historical Texts

This paper deals with normalization of language data from Early New High German. We describe an unsupervised, rule-based approach which maps historical wordforms to modern wordforms. Rules are specified in the form of context-aware rewrite rules that apply to sequences of characters. They are derived from two aligned versions of the Luther bible and weighted according to their frequency. The eva...



Journal

Journal title: Investigationes Linguisticae

Year: 2018

ISSN: 1426-188X

DOI: 10.14746/il.2017.37.2.